Introduction

TxRegQuery supports exploration of transcriptional regulatory networks by integrating data on expression quantitative trait loci (eQTL), digital genomic footprinting (DGF), DNase I hypersensitivity (DHS), and transcription factor binding sites (TFBS). Owing to the volume of emerging tissue-specific data, the data are stored in a MongoDB database and retrieved by query, as described below.

Managing heterogeneous file content with MongoDB

Querying the txregnet database

The README.md for this package describes how to populate a MongoDB instance with demonstrative data. We focus on the CRAN package mongolite as the interface to this data.

The connection

suppressPackageStartupMessages({
library(TxRegInfra2)
library(mongolite)
library(TnT)
library(EnsDb.Hsapiens.v75)
library(BiocParallel)
register(SerialParam())
})
con1 = mongo(url=URL_txregLocal(), 
   db="txregnet", collection="Lung_allpairs_v7_eQTL")
names(con1)
##  [1] "aggregate"  "count"      "disconnect" "distinct"   "drop"      
##  [6] "export"     "find"       "import"     "index"      "info"      
## [11] "insert"     "iterate"    "mapreduce"  "remove"     "rename"    
## [16] "replace"    "run"        "update"
con1$find(limit=1)
##             gene_id       variant_id tss_distance ma_samples ma_count       maf
## 1 ENSG00000227232.4 1_860956_C_G_b37       831403         11       11 0.0143979
##   pval_nominal     slope slope_se     qvalue chr snp_pos A1 A2 build
## 1   0.00123033 -0.921918 0.282688 0.08029171   1  860956  C  G   b37

Our aim is to produce tools based on Bioconductor idioms that answer questions about transcription regulation on the basis of documents stored in a MongoDB database.

The mongolite API offers little explicit support for reflection. The following is not part of the formal mongolite API, but it shows that the mongo instance can be queried for information about its origins.

try(parent.env(con1)$orig[c("name", "db", "url")])
## $name
## [1] "Lung_allpairs_v7_eQTL"
## 
## $db
## [1] "txregnet"
## 
## $url
## [1] "mongodb://127.0.0.1:27017"

Queries and aggregation

MongoDB is a schemaless technology. A ‘database’ in MongoDB is a family of named ‘collections’, and collections can be searched using the ‘find’ operation.

We can only use this package on systems where the mongod service is running and accepting connections.

We can get a list of collections in the database as follows.

con1$run('{"listCollections":1}')$cursor$firstBatch[,"name"]
## [1] "vjc2"                      "vjc1"                     
## [3] "fPlacenta_DS20346_hg19_FP" "fLung_DS14724_hg19_FP"    
## [5] "M5946_1_02_tf"             "ENCFF001WBZ_hg19_HS"      
## [7] "Lung_allpairs_v7_eQTL"

For a single record from a given collection:

mongo(url=URL_txregLocal(), db="txregnet", 
   collection="Lung_allpairs_v7_eQTL")$find(limit=1)
##             gene_id       variant_id tss_distance ma_samples ma_count       maf
## 1 ENSG00000227232.4 1_860956_C_G_b37       831403         11       11 0.0143979
##   pval_nominal     slope slope_se     qvalue chr snp_pos A1 A2 build
## 1   0.00123033 -0.921918 0.282688 0.08029171   1  860956  C  G   b37
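The find operation also accepts a JSON query document that restricts which records are returned. The following is a minimal sketch, not part of TxRegInfra2 or mongolite: the helper region_query is hypothetical, and it assumes that the chr and snp_pos fields seen in the eQTL record above are stored as integers. It builds a range query for a genomic interval using MongoDB's $gte and $lte comparison operators.

```r
# Hypothetical helper (not part of TxRegInfra2 or mongolite): build a JSON
# query selecting eQTL records on one chromosome with snp_pos in [start, end].
# Assumes integer-valued 'chr' and 'snp_pos' fields, as in the record above.
region_query = function(chr, start, end) {
  sprintf('{"chr": %d, "snp_pos": {"$gte": %d, "$lte": %d}}',
          as.integer(chr), as.integer(start), as.integer(end))
}

q = region_query(17, 38.06e6, 38.15e6)
cat(q, "\n")

# With a running mongod and the con1 connection defined earlier, the query
# string would be passed to find(), for example:
# con1$find(query = q, limit = 5)
```

Building the query as a string keeps the sketch free of extra dependencies; a JSON serializer could be used instead when queries become more deeply nested.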

Queries can be composed using JSON. We have a tool to generate queries that employ the MongoDB aggregation method. Here we demonstrate this by computing, for each chromosome, the count and minimum value of the footprint statistic on a sample of placental cells.

m1 = mongo(url = URL_txregLocal(), db = "txregnet",  collection="fPlacenta_DS20346_hg19_FP")
newagg = makeAggregator( by="chr", vbl="stat", op="$min", opname="min")

The JSON layout of this aggregating query is

[
  {
    "$group": {
      "_id": ["$chr"],
      "count": {
        "$sum": [1]
      },
      "min": {
        "$min": ["$stat"]
      }
    }
  }
] 
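To make the structure of such a pipeline concrete, here is a minimal sketch that writes a $group stage as a plain string. The function group_pipeline is hypothetical and is not the package's makeAggregator, whose implementation may differ. The sketch emits scalar values ("$chr", 1) rather than the one-element arrays shown in the layout above.

```r
# Hypothetical sketch, not the implementation of makeAggregator():
# compose a one-stage aggregation pipeline that groups documents by the
# field 'by', counts them, and applies the accumulator 'op' to field 'vbl'.
group_pipeline = function(by, vbl, op = "$min", opname = "min") {
  sprintf('[{"$group": {"_id": "$%s", "count": {"$sum": 1}, "%s": {"%s": "$%s"}}}]',
          by, opname, op, vbl)
}

pipeline = group_pipeline(by = "chr", vbl = "stat", op = "$min", opname = "min")
cat(pipeline, "\n")

# With a running mongod and the m1 connection defined above, the string
# could be supplied to the aggregate method:
# head(m1$aggregate(pipeline))
```

Changing op and opname (for example "$max" and "max") gives a different per-chromosome summary with the same grouping.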

Invocation returns a data frame:

head(m1$aggregate(newagg))
##     _id  count        min
## 1 chr10  72826 0.00870233
## 2  chr4  72074 0.00986576
## 3  chr5  83737 0.00674804
## 4  chr9  66610 0.01753630
## 5 chr17  67314 0.01207180
## 6  chr6 100439 0.01964700

An integrative container

We need a container that binds sample metadata to information about the MongoDB resource. NB: We may want to utilize MultiAssayExperiment.

Sample metadata

The following turns a very ad hoc filtering of the collection names into a DataFrame.

cd = TxRegInfra2::basicColData.tiny
head(cd,2)
## DataFrame with 2 rows and 3 columns
##                              base        type          mid
##                       <character> <character>  <character>
## ENCFF001WBZ_hg19_HS   ENCFF001WBZ          HS         hg19
## fLung_DS14724_hg19_FP       fLung          FP DS14724_hg19
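The two rows shown above suggest how collection names map to columns: the first underscore-delimited token becomes base, the last becomes type, and whatever lies between becomes mid. The sketch below reproduces that mapping in base R. The function parse_collection_names is hypothetical, it returns a plain data.frame rather than a DataFrame, and it is not necessarily how basicColData.tiny was built.

```r
# Hypothetical helper: split collection names on "_" and keep the first
# token as 'base', the last as 'type', and the middle tokens as 'mid'.
parse_collection_names = function(nms) {
  parts = strsplit(nms, "_", fixed = TRUE)
  base = vapply(parts, function(p) p[1], character(1))
  type = vapply(parts, function(p) p[length(p)], character(1))
  mid = vapply(parts, function(p) paste(p[-c(1, length(p))], collapse = "_"),
               character(1))
  data.frame(base = base, type = type, mid = mid,
             row.names = nms, stringsAsFactors = FALSE)
}

cd_sketch = parse_collection_names(
  c("ENCFF001WBZ_hg19_HS", "fLung_DS14724_hg19_FP"))
cd_sketch
```

Names that do not follow this pattern (for example "vjc1") would yield uninformative rows, which is one reason the filtering is described as ad hoc.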

Extended RaggedExperiment

rme0 = RaggedMongoExpt(con1, colData=cd)
rme1 = rme0[, which(cd$type=="FP")]

A key method in development is subsetting the archive by genomic coordinates. This is accomplished with sbov, which is an early implementation of the (planned) subsetByOverlaps generic.

si = GenomeInfoDb::Seqinfo(genome="hg19")["chr17"] # to fix query genome
myg = GRanges("chr17", IRanges(38.07e6,38.09e6), seqinfo=si)
s1 = sbov(rme1, myg, simplify=FALSE)
## ..
s1
## class: RaggedExperiment 
## dim: 111 2 
## assays(3): chr id stat
## rownames: NULL
## colnames(2): fLung_DS14724_hg19_FP fPlacenta_DS20346_hg19_FP
## colData names(6): base type ... type mid
#dim(sa <- sparseAssay(s1, 3))  # compact gives segfault
sa = as(s1, "GRangesList")
sa
## GRangesList object of length 2:
## $fLung_DS14724_hg19_FP
## GRanges object with 30 ranges and 3 metadata columns:
##        seqnames            ranges strand |         chr          id      stat
##           <Rle>         <IRanges>  <Rle> | <character> <character> <numeric>
##    [1]    chr17 38083253-38083260      * |       chr17   id-454581  0.761905
##    [2]    chr17 38083323-38083335      * |       chr17   id-454582  0.944444
##    [3]    chr17 38083535-38083541      * |       chr17   id-454583  0.805556
##    [4]    chr17 38083577-38083590      * |       chr17   id-454584  0.943787
##    [5]    chr17 38083593-38083606      * |       chr17   id-454585  0.933258
##    ...      ...               ...    ... .         ...         ...       ...
##   [26]    chr17 38084078-38084088      * |       chr17   id-454606  0.829365
##   [27]    chr17 38084097-38084104      * |       chr17   id-454607  0.800000
##   [28]    chr17 38084110-38084150      * |       chr17   id-454608  0.880682
##   [29]    chr17 38084160-38084169      * |       chr17   id-454609  0.533333
##   [30]    chr17 38084924-38084952      * |       chr17   id-454610  0.890476
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## $fPlacenta_DS20346_hg19_FP
## GRanges object with 81 ranges and 3 metadata columns:
##        seqnames            ranges strand |         chr          id      stat
##           <Rle>         <IRanges>  <Rle> | <character> <character> <numeric>
##    [1]    chr17 38073443-38073472      * |       chr17   id-600855  0.657201
##    [2]    chr17 38074829-38074840      * |       chr17   id-600856  0.777273
##    [3]    chr17 38074866-38074872      * |       chr17   id-600857  0.916667
##    [4]    chr17 38074882-38074908      * |       chr17   id-600858  0.817949
##    [5]    chr17 38074913-38074920      * |       chr17   id-600859  0.874060
##    ...      ...               ...    ... .         ...         ...       ...
##   [77]    chr17 38089527-38089535      * |       chr17   id-600931  0.894269
##   [78]    chr17 38089573-38089589      * |       chr17   id-600932  0.790179
##   [79]    chr17 38089599-38089606      * |       chr17   id-600933  0.536646
##   [80]    chr17 38089671-38089677      * |       chr17   id-600934  0.582996
##   [81]    chr17 38089691-38089728      * |       chr17   id-600935  0.642579
##   -------
##   seqinfo: 1 sequence from hg19 genome

Visualizing coincidence

ormm = txmodels("ORMDL3", plot=FALSE, name="ORMDL3")
## Loading required namespace: biovizBase
## Loading required namespace: Gviz
#sar = strsplit(rownames(sa), ":|-")
dat = unlist(sa)
dat$score = 1-dat$stat
dat = split(dat, names(dat))
dat[[1]]$value = dat[[1]]$score # for TnT
dat[[2]]$value = dat[[2]]$score
d1 = dat[[1]]
width(d1) = 1
d2 = dat[[2]]
width(d2) = 1
names(d1) = seq_len(length(d1)) # for TnT, can't have duplicated rownames
names(d2) = seq_len(length(d2))
pt1 = PinTrack(d1)
pt2 = PinTrack(d2)
data(tnt_genetrack_hg19)
data(tnt_txtrack_hg19)
vr = GRanges("chr17", IRanges(38.05e6, width=50000))
TnTGenome(list(pt1,pt2,tnt_genetrack_hg19,tnt_txtrack_hg19), view.range=vr)

Higher-level work with sbov

Building annotated GRanges for a selected target interval

We begin with three ‘single-concept’ assays relevant to lung genomics: the v7 GTEx lung eQTL data, an ENCODE DNase I narrowPeak report on lung fibroblasts, and a digital genomic footprint report for fetal lung.

lname_eqtl = "Lung_allpairs_v7_eQTL"
lname_dhs = "ENCFF001WBZ_hg19_HS" # see dnmeta, fibroblast of lung
lname_fp = "fLung_DS14724_hg19_FP"
si17 = GenomeInfoDb::Seqinfo(genome="hg19")["chr17"]
si17n = si17
GenomeInfoDb::seqlevelsStyle(si17n) = "NCBI"
s1 = sbov(rme0[,lname_eqtl], GRanges("17", IRanges(38.06e6, 38.15e6),
    seqinfo=si17n))
## .
s2 = sbov(rme0[,lname_dhs], GRanges("chr17", IRanges(38.06e6, 38.15e6),
   seqinfo=si17))
## .
s3 = sbov(rme0[,lname_fp], GRanges("chr17", IRanges(38.06e6, 38.15e6),
   seqinfo=si17))
## .

Now we have annotated GRanges for each assay. The eQTL data in part are:

names(mcols(s1))
##  [1] "gene_id"      "variant_id"   "tss_distance" "ma_samples"   "ma_count"    
##  [6] "maf"          "pval_nominal" "slope"        "slope_se"     "qvalue"      
## [11] "chr"          "snp_pos"      "A1"           "A2"           "build"       
## [16] "origin"
head(s1[, c("gene_id", "variant_id", "maf", "pval_nominal")])
## GRanges object with 6 ranges and 4 metadata columns:
##       seqnames    ranges strand |            gene_id          variant_id
##          <Rle> <IRanges>  <Rle> |        <character>         <character>
##   [1]       17  38061054      * |  ENSG00000266469.1 17_38061054_G_A_b37
##   [2]       17  38061439      * |  ENSG00000161395.8 17_38061439_T_C_b37
##   [3]       17  38061439      * | ENSG00000073605.14 17_38061439_T_C_b37
##   [4]       17  38061439      * |  ENSG00000172057.5 17_38061439_T_C_b37
##   [5]       17  38061439      * |  ENSG00000167914.6 17_38061439_T_C_b37
##   [6]       17  38062196      * |  ENSG00000161395.8 17_38062196_G_A_b37
##             maf pval_nominal
##       <numeric>    <numeric>
##   [1] 0.0195822  7.72192e-04
##   [2] 0.4203660  3.99212e-04
##   [3] 0.4203660  6.87714e-10
##   [4] 0.4203660  1.08337e-10
##   [5] 0.4203660  2.15704e-10
##   [6] 0.4188480  3.09568e-04
##   -------
##   seqinfo: 1 sequence from GRCh37.p13 genome

The names of genes and variants used here are cumbersome; gene symbols and rsIDs are preferable.

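# add a 'symbol' metadata column by looking up Ensembl gene IDs in an EnsDb
addsyms = function(x, EnsDb=EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75) {
  ensids = gsub("\\..*", "", x$gene_id) # remove version suffix after the period
  gns = genes(EnsDb) # GRanges of genes, named by Ensembl gene ID
  x$symbol = gns[ensids]$symbol # look up symbols by (unversioned) gene ID
  x
}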
s1 = addsyms(s1)
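The ID handling in addsyms can be seen in isolation with a few of the gene IDs from this document. In this sketch a small named character vector stands in for genes(EnsDb); the name sym_lookup is an assumption introduced for illustration, and the three symbols are taken from outputs shown elsewhere in this vignette.

```r
# Versioned Ensembl gene IDs as stored in the eQTL collection
gene_ids = c("ENSG00000172057.5", "ENSG00000073605.14", "ENSG00000167914.6")

# Drop the version suffix: everything from the first period onward
ensids = gsub("\\..*", "", gene_ids)
ensids

# Stand-in for the EnsDb lookup (assumption: a named vector keyed by
# unversioned gene ID; symbols as reported in this document's outputs)
sym_lookup = c(ENSG00000073605 = "GSDMB",
               ENSG00000167914 = "GSDMA",
               ENSG00000172057 = "ORMDL3")
symbols = unname(sym_lookup[ensids])
symbols
```

Indexing by name returns the symbols in the order of the query IDs, which is what allows the result to be attached as a metadata column.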

Note that it is possible to retrieve rsids for the SNPs by address. But this is a slow operation involving a huge SNPlocs package that we do not want to work with directly for this vignette.

> snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, s1b)
UnstitchedGPos object with 265 positions and 2 metadata columns:
        seqnames       pos strand |   RefSNP_id alleles_as_ambig
           <Rle> <integer>  <Rle> | <character>      <character>
    [1]       17  38061054      * |  rs36049276                R
    [2]       17  38061439      * |   rs4795399                Y
    [3]       17  38062196      * |   rs2305480                R
    [4]       17  38062217      * |   rs2305479                Y
    [5]       17  38062503      * |  rs35104165                Y
    ...      ...       ...    ... .         ...              ...
  [261]       17  38149258      * |  rs58212353                K
  [262]       17  38149350      * |   rs8073254                V
  [263]       17  38149411      * |  rs34648856                R
  [264]       17  38149724      * |   rs3785549                Y
  [265]       17  38149727      * |   rs3785550                H
  -------
  seqinfo: 25 sequences (1 circular) from GRCh37.p13 genome

A bipartite graph for eQTL-gene relationships

The object s1 computed above is available as demo_eQTL_granges. We convert it to a graph via

library(graph)
## 
## Attaching package: 'graph'
## The following object is masked from 'package:Biostrings':
## 
##     complement
g1 = sbov_to_graphNEL(demo_eQTL_granges)
g1
## A graphNEL graph with directed edges
## Number of Nodes = 312 
## Number of Edges = 693

Nodes are SNPs and genes; an edge is present when the resource (in this case the GTEx lung study) declares an association (in this case, an FDR for SNP-gene association not exceeding 0.10). The graph package includes functions for creating incidence matrices from graphs, and vice versa.
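The construction can be illustrated on a toy table. This is a sketch under stated assumptions and not the implementation of sbov_to_graphNEL: edges are assumed to run from SNP to gene, the threshold is applied to the qvalue column, and the last row of the table is invented to show the filtering step. The first three rows reuse values reported elsewhere in this document.

```r
suppressPackageStartupMessages(library(graph))

# Toy SNP-gene table; the final row is made up for illustration only.
eq = data.frame(
  variant_id = c(rep("17_38085385_A_C_b37", 3), "made_up_variant"),
  symbol = c("GSDMB", "ORMDL3", "RP11-387H17.4", "GSDMB"),
  qvalue = c(9.35455e-06, 9.35524e-07, 4.00269e-02, 0.5),
  stringsAsFactors = FALSE
)

# Keep associations with FDR (qvalue) not exceeding 0.10
keep = eq[eq$qvalue <= 0.10, ]

# Nodes are the retained SNPs and genes; one directed edge per retained row
# (SNP -> gene direction is an assumption of this sketch)
g = graphNEL(nodes = unique(c(keep$variant_id, keep$symbol)),
             edgemode = "directed")
g = addEdge(from = keep$variant_id, to = keep$symbol, graph = g)
numNodes(g)
numEdges(g)
```

Because every edge joins a SNP node to a gene node, the resulting graph is bipartite by construction.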

Connecting eQTL-SNPs via DHS and DGF

Given the GRanges representations for sbov results, we can use overlap computations to conveniently identify relationships between eQTL SNPs, genes, and hypersensitivity or footprint regions.

We use sbov_output_HS as a persistent instance of s2 computed above.

seqlevelsStyle(demo_eQTL_granges) = "UCSC" # Fails xmas 2020
seqlevels(demo_eQTL_granges) = "chr17"
fo1 = findOverlaps(demo_eQTL_granges, sbov_output_HS)
fo1 
## Hits object with 11 hits and 0 metadata columns:
##        queryHits subjectHits
##        <integer>   <integer>
##    [1]       205           2
##    [2]       206           2
##    [3]       207           2
##    [4]       458           9
##    [5]       459           9
##    [6]       460           9
##    [7]       461           9
##    [8]       462           9
##    [9]       463           9
##   [10]       464           9
##   [11]       465           9
##   -------
##   queryLength: 693 / subjectLength: 11
eq_by_hs = split(demo_eQTL_granges[queryHits(fo1)],
   subjectHits(fo1))
eq_by_hs
## GRangesList object of length 2:
## $`2`
## GRanges object with 3 ranges and 17 metadata columns:
##       seqnames    ranges strand |            gene_id          variant_id
##          <Rle> <IRanges>  <Rle> |           <factor>         <character>
##   [1]    chr17  38085385      * | ENSG00000073605.14 17_38085385_A_C_b37
##   [2]    chr17  38085385      * | ENSG00000172057.5  17_38085385_A_C_b37
##   [3]    chr17  38085385      * | ENSG00000264968.1  17_38085385_A_C_b37
##       tss_distance ma_samples  ma_count       maf pval_nominal     slope
##          <integer>  <integer> <integer> <numeric>    <numeric> <numeric>
##   [1]        10482        172       207  0.270235  3.71880e-08  0.161276
##   [2]         1531        172       207  0.270235  3.07153e-09  0.193448
##   [3]         1390        172       207  0.270235  4.94800e-04 -0.230682
##        slope_se      qvalue       chr   snp_pos       A1       A2    build
##       <numeric>   <numeric> <integer> <integer> <factor> <factor> <factor>
##   [1] 0.0285803 9.35455e-06        17  38085385        A        C      b37
##   [2] 0.0317052 9.35524e-07        17  38085385        A        C      b37
##   [3] 0.0655330 4.00269e-02        17  38085385        A        C      b37
##                      origin        symbol
##                 <character>   <character>
##   [1] Lung_allpairs_v7_eQTL         GSDMB
##   [2] Lung_allpairs_v7_eQTL        ORMDL3
##   [3] Lung_allpairs_v7_eQTL RP11-387H17.4
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## $`9`
## GRanges object with 8 ranges and 17 metadata columns:
##       seqnames    ranges strand |            gene_id            variant_id
##          <Rle> <IRanges>  <Rle> |           <factor>           <character>
##   [1]    chr17  38115299      * | ENSG00000167914.6    17_38115299_C_T_b37
##   [2]    chr17  38115299      * | ENSG00000188895.7    17_38115299_C_T_b37
##   [3]    chr17  38115429      * | ENSG00000073605.14 17_38115429_C_CTG_b37
##   [4]    chr17  38115429      * | ENSG00000172057.5  17_38115429_C_CTG_b37
##   [5]    chr17  38115429      * | ENSG00000167914.6  17_38115429_C_CTG_b37
##   [6]    chr17  38115430      * | ENSG00000073605.14   17_38115430_A_C_b37
##   [7]    chr17  38115430      * | ENSG00000172057.5    17_38115430_A_C_b37
##   [8]    chr17  38115430      * | ENSG00000167914.6    17_38115430_A_C_b37
##       tss_distance ma_samples  ma_count       maf pval_nominal     slope
##          <integer>  <integer> <integer> <numeric>    <numeric> <numeric>
##   [1]        -3927         59        62 0.0809399  1.73014e-06 -0.530695
##   [2]      -163252         59        62 0.0809399  5.12527e-04 -0.197247
##   [3]        40526        267       350 0.4569190  3.30451e-06  0.126418
##   [4]        31575        267       350 0.4569190  6.33228e-06  0.137325
##   [5]        -3797        267       350 0.4569190  1.04416e-17 -0.499680
##   [6]        40527        267       350 0.4569190  3.30451e-06  0.126418
##   [7]        31576        267       350 0.4569190  6.33228e-06  0.137325
##   [8]        -3796        267       350 0.4569190  1.04416e-17 -0.499680
##        slope_se      qvalue       chr   snp_pos       A1       A2    build
##       <numeric>   <numeric> <integer> <integer> <factor> <factor> <factor>
##   [1] 0.1088720 3.07750e-04        17  38115299        C      T        b37
##   [2] 0.0561895 4.11570e-02        17  38115299        C      T        b37
##   [3] 0.0266958 5.47361e-04        17  38115429        C      CTG      b37
##   [4] 0.0299020 9.76300e-04        17  38115429        C      CTG      b37
##   [5] 0.0549122 8.93031e-15        17  38115429        C      CTG      b37
##   [6] 0.0266958 5.47361e-04        17  38115430        A      C        b37
##   [7] 0.0299020 9.76300e-04        17  38115430        A      C        b37
##   [8] 0.0549122 8.93031e-15        17  38115430        A      C        b37
##                      origin      symbol
##                 <character> <character>
##   [1] Lung_allpairs_v7_eQTL       GSDMA
##   [2] Lung_allpairs_v7_eQTL        MSL1
##   [3] Lung_allpairs_v7_eQTL       GSDMB
##   [4] Lung_allpairs_v7_eQTL      ORMDL3
##   [5] Lung_allpairs_v7_eQTL       GSDMA
##   [6] Lung_allpairs_v7_eQTL       GSDMB
##   [7] Lung_allpairs_v7_eQTL      ORMDL3
##   [8] Lung_allpairs_v7_eQTL       GSDMA
##   -------
##   seqinfo: 1 sequence from hg19 genome

This shows that there are two DHS sites that overlap with SNPs showing eQTL associations with various genes.
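The overlap-and-split pattern used above is easier to follow on a toy example. The coordinates and labels below are invented for illustration; only the GenomicRanges functions themselves (findOverlaps, queryHits, subjectHits, split) are the ones used in this vignette.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Four width-1 "SNP" ranges with a label, and two "site" intervals
# (all coordinates are made up)
snps = GRanges("chr17", IRanges(c(100, 150, 150, 400), width = 1),
               symbol = c("A", "B", "C", "D"))
sites = GRanges("chr17", IRanges(start = c(90, 300), end = c(160, 350)))

# One hit per (SNP, site) pair that overlaps
hits = findOverlaps(snps, sites)

# Group the overlapping SNPs by the index of the site they fall in;
# sites with no overlapping SNP do not appear in the result
by_site = split(snps[queryHits(hits)], subjectHits(hits))
names(by_site)
by_site[["1"]]$symbol
```

The names of the resulting list are indices into the subject, so by_site[["1"]] holds the SNPs that fall in sites[1].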

For the footprint data, we have:

fo2 = findOverlaps(demo_eQTL_granges, sbov_output_FP)
fo2 
## Hits object with 4 hits and 0 metadata columns:
##       queryHits subjectHits
##       <integer>   <integer>
##   [1]       348          44
##   [2]       349          44
##   [3]       613         101
##   [4]       614         101
##   -------
##   queryLength: 693 / subjectLength: 107
eq_by_fp = split(demo_eQTL_granges[queryHits(fo2)],
   subjectHits(fo2))
eq_by_fp
## GRangesList object of length 2:
## $`44`
## GRanges object with 2 ranges and 17 metadata columns:
##       seqnames    ranges strand |           gene_id          variant_id
##          <Rle> <IRanges>  <Rle> |          <factor>         <character>
##   [1]    chr17  38109075      * | ENSG00000172057.5 17_38109075_T_C_b37
##   [2]    chr17  38109075      * | ENSG00000167914.6 17_38109075_T_C_b37
##       tss_distance ma_samples  ma_count       maf pval_nominal     slope
##          <integer>  <integer> <integer> <numeric>    <numeric> <numeric>
##   [1]        25221        182       203  0.285915  3.35618e-04  0.135266
##   [2]       -10151        182       203  0.285915  2.23245e-14 -0.555363
##        slope_se      qvalue       chr   snp_pos       A1       A2    build
##       <numeric>   <numeric> <integer> <integer> <factor> <factor> <factor>
##   [1] 0.0373061 2.93258e-02        17  38109075        T        C      b37
##   [2] 0.0693376 1.36746e-11        17  38109075        T        C      b37
##                      origin      symbol
##                 <character> <character>
##   [1] Lung_allpairs_v7_eQTL      ORMDL3
##   [2] Lung_allpairs_v7_eQTL       GSDMA
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## $`101`
## GRanges object with 2 ranges and 17 metadata columns:
##       seqnames    ranges strand |            gene_id          variant_id
##          <Rle> <IRanges>  <Rle> |           <factor>         <character>
##   [1]    chr17  38137033      * | ENSG00000008838.13 17_38137033_A_G_b37
##   [2]    chr17  38137033      * | ENSG00000167914.6  17_38137033_A_G_b37
##       tss_distance ma_samples  ma_count       maf pval_nominal     slope
##          <integer>  <integer> <integer> <numeric>    <numeric> <numeric>
##   [1]       -80435        274       359  0.468668  7.71652e-04  0.123970
##   [2]        17807        274       359  0.468668  5.75457e-33  0.679408
##        slope_se      qvalue       chr   snp_pos       A1       A2    build
##       <numeric>   <numeric> <integer> <integer> <factor> <factor> <factor>
##   [1] 0.0365067 5.65350e-02        17  38137033        A        G      b37
##   [2] 0.0504649 1.32846e-29        17  38137033        A        G      b37
##                      origin      symbol
##                 <character> <character>
##   [1] Lung_allpairs_v7_eQTL       MED24
##   [2] Lung_allpairs_v7_eQTL       GSDMA
##   -------
##   seqinfo: 1 sequence from hg19 genome

Relationships to FIMO-based TFBS

A small number of cloud-resident FIMO search results are available through the TFutils package.

library(TFutils)
data(demo_fimo_granges)
seqlevelsStyle(demo_eQTL_granges) = "UCSC"
lapply(demo_fimo_granges, lapply, function(x) 
   subsetByOverlaps(demo_eQTL_granges, x))
## $VDR
## $VDR$`chr17:38070000-38090000`
## GRanges object with 0 ranges and 17 metadata columns:
##    seqnames    ranges strand |  gene_id  variant_id tss_distance ma_samples
##       <Rle> <IRanges>  <Rle> | <factor> <character>    <integer>  <integer>
##     ma_count       maf pval_nominal     slope  slope_se    qvalue       chr
##    <integer> <numeric>    <numeric> <numeric> <numeric> <numeric> <integer>
##      snp_pos       A1       A2    build      origin      symbol
##    <integer> <factor> <factor> <factor> <character> <character>
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## 
## $POU2F1
## $POU2F1$`chr17:38070000-38090000`
## GRanges object with 8 ranges and 17 metadata columns:
##       seqnames    ranges strand |            gene_id             variant_id
##          <Rle> <IRanges>  <Rle> |           <factor>            <character>
##   [1]    chr17  38073968      * | ENSG00000161395.8     17_38073968_G_C_b37
##   [2]    chr17  38073968      * | ENSG00000073605.14    17_38073968_G_C_b37
##   [3]    chr17  38073968      * | ENSG00000172057.5     17_38073968_G_C_b37
##   [4]    chr17  38073968      * | ENSG00000167914.6     17_38073968_G_C_b37
##   [5]    chr17  38076198      * | ENSG00000161395.8  17_38076198_TATA_T_b37
##   [6]    chr17  38076198      * | ENSG00000073605.14 17_38076198_TATA_T_b37
##   [7]    chr17  38076198      * | ENSG00000172057.5  17_38076198_TATA_T_b37
##   [8]    chr17  38076198      * | ENSG00000167914.6  17_38076198_TATA_T_b37
##       tss_distance ma_samples  ma_count       maf pval_nominal      slope
##          <integer>  <integer> <integer> <numeric>    <numeric>  <numeric>
##   [1]       220918        251       321  0.419060  2.46542e-04  0.0992119
##   [2]         -935        251       321  0.419060  6.84037e-10  0.1727410
##   [3]        -9886        251       321  0.419060  4.19548e-11  0.2056080
##   [4]       -45258        251       321  0.419060  4.86746e-10 -0.3888670
##   [5]       223148        285       378  0.497368  4.67230e-06  0.1203940
##   [6]         1295        285       378  0.497368  3.28042e-15  0.2116420
##   [7]        -7656        285       378  0.497368  1.62754e-14  0.2310670
##   [8]       -43028        285       378  0.497368  1.64858e-08 -0.3465760
##        slope_se      qvalue       chr   snp_pos       A1       A2    build
##       <numeric>   <numeric> <integer> <integer> <factor> <factor> <factor>
##   [1] 0.0267552 2.28109e-02        17  38073968     G           C      b37
##   [2] 0.0271367 2.32092e-07        17  38073968     G           C      b37
##   [3] 0.0300749 1.71834e-08        17  38073968     G           C      b37
##   [4] 0.0605305 1.69013e-07        17  38073968     G           C      b37
##   [5] 0.0258368 7.45086e-04        17  38076198     TATA        T      b37
##   [6] 0.0255291 2.19298e-12        17  38076198     TATA        T      b37
##   [7] 0.0286818 1.01079e-11        17  38076198     TATA        T      b37
##   [8] 0.0598010 4.42487e-06        17  38076198     TATA        T      b37
##                      origin      symbol
##                 <character> <character>
##   [1] Lung_allpairs_v7_eQTL       PGAP3
##   [2] Lung_allpairs_v7_eQTL       GSDMB
##   [3] Lung_allpairs_v7_eQTL      ORMDL3
##   [4] Lung_allpairs_v7_eQTL       GSDMA
##   [5] Lung_allpairs_v7_eQTL       PGAP3
##   [6] Lung_allpairs_v7_eQTL       GSDMB
##   [7] Lung_allpairs_v7_eQTL      ORMDL3
##   [8] Lung_allpairs_v7_eQTL       GSDMA
##   -------
##   seqinfo: 1 sequence from hg19 genome